Thesis Proposal: Approximate Dynamic Programming Using Bellman Residual Elimination

Authors

  • Brett Bethke
  • Jonathan P. How
  • Dimitri Bertsekas
  • Asuman Ozdaglar
Abstract

The overarching goal of the thesis is to devise new strategies for multi-agent planning and control problems, especially in cases where the agents are subject to random failures, maintenance needs, or other health management concerns, or where the system model is not perfectly known. We argue that dynamic programming techniques, in particular Markov Decision Processes (MDPs), are a natural framework for addressing these planning problems, and present an MDP problem formulation for a persistent surveillance mission that incorporates stochastic fuel usage dynamics and the possibility of randomly occurring failures into the planning process. We show that this problem formulation and its optimal policy lead to good mission performance in a number of real-world scenarios. Furthermore, an on-line, adaptive solution framework is developed that allows the planning system to improve its performance over time, even when the true system model is uncertain or time-varying. Motivated by the difficulty of solving the persistent mission problem exactly when the number of agents becomes large, we then develop a new family of approximate dynamic programming algorithms, called Bellman Residual Elimination (BRE) methods, which can be employed to approximately solve large-scale MDPs. We analyze these methods and prove a number of desirable theoretical properties, including reduction to exact policy iteration under certain conditions. Finally, we apply these BRE methods to large-scale persistent surveillance problems and show that they yield good performance and, furthermore, that they can be successfully integrated into the adaptive planning framework.
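As a brief illustration of the quantity that BRE drives to zero, the sketch below sets up a toy tabular MDP (the discount factor gamma, transition matrix P_pi, and reward vector r_pi are assumed for illustration; this is not the thesis's kernel-based implementation) and eliminates the fixed-policy Bellman residual by exact policy evaluation.

```python
import numpy as np

# Minimal sketch: for a fixed policy with transition matrix P_pi and reward
# vector r_pi, the Bellman residual of a value estimate J is
#     BR(J) = r_pi + gamma * P_pi @ J - J.
# BRE seeks an approximation whose residual is zero at the sampled states;
# in the tabular case this coincides with exact policy evaluation.

gamma = 0.95                 # discount factor (assumed)
n_states = 4                 # toy problem size (assumed)
rng = np.random.default_rng(0)

P_pi = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
r_pi = rng.uniform(0.0, 1.0, size=n_states)             # expected one-step rewards

def bellman_residual(J):
    """Residual of the fixed-policy Bellman equation at every state."""
    return r_pi + gamma * P_pi @ J - J

# Eliminating the residual means solving (I - gamma * P_pi) J = r_pi.
J_exact = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

print("max |Bellman residual| after elimination:",
      np.max(np.abs(bellman_residual(J_exact))))  # ~0 up to round-off
```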


Similar Articles

Difference of Convex Functions Programming for Reinforcement Learning

Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of th...
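For concreteness, the sketch below (a toy tabular MDP with assumed transition tensor P, reward table R, and discount gamma; it does not perform the paper's DC decomposition) evaluates the optimal Bellman residual whose norm serves as the minimization objective.

```python
import numpy as np

# The objective is a norm of the optimal Bellman residual
#     OBR(Q)(s, a) = r(s, a) + gamma * E_{s'}[ max_a' Q(s', a') ] - Q(s, a),
# computed here exactly on a small tabular MDP.

gamma = 0.9
n_states, n_actions = 3, 2
rng = np.random.default_rng(1)

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # r(s, a)

def optimal_bellman_residual(Q):
    """Residual of the optimal Bellman equation for a candidate Q-table."""
    return R + gamma * P @ Q.max(axis=1) - Q

Q_guess = np.zeros((n_states, n_actions))
objective = np.linalg.norm(optimal_bellman_residual(Q_guess)) ** 2  # squared L2 norm
print("||OBR(Q)||^2 for the zero initialization:", objective)
```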


Should one minimize the Bellman residual or maximize the mean value?

This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value (the predominant approach in policy search algorithms) and ii) minimization of the Bellman residual (mainly used in approximate dynamic programming). For doing so, we introduce a new policy search algorithm based on the minimization of the resi...
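The sketch below (a toy tabular MDP with an assumed uniform initial-state distribution nu) evaluates both criteria for the greedy policy induced by a candidate Q-table: the mean value to be maximized and the Bellman residual norm to be minimized.

```python
import numpy as np

gamma = 0.9
n_states, n_actions = 3, 2
rng = np.random.default_rng(2)

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # r(s, a)
nu = np.full(n_states, 1.0 / n_states)                            # initial-state distribution

def criteria(Q):
    pi = Q.argmax(axis=1)                              # greedy policy induced by Q
    P_pi = P[np.arange(n_states), pi]                  # transition matrix under pi
    r_pi = R[np.arange(n_states), pi]
    J_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)  # exact evaluation
    mean_value = nu @ J_pi                             # criterion (i): maximize
    residual = R + gamma * P @ Q.max(axis=1) - Q       # optimal Bellman residual
    return mean_value, np.linalg.norm(residual)        # criterion (ii): minimize

print(criteria(np.zeros((n_states, n_actions))))
```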


Robust Value Function Approximation Using Bilinear Programming

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Sol...


Approximate dynamic programming via direct search in the space of value function approximations

This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span s...
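The sketch below (a small fixed-policy problem with assumed P_pi and r_pi) defines the span semi-norm, span(x) = max(x) - min(x), and checks convexity of the residual's span along a segment between two candidate value functions, which is the property that motivates the local search.

```python
import numpy as np

def span(x):
    """Span semi-norm: largest component minus smallest component."""
    return np.max(x) - np.min(x)

gamma = 0.95
rng = np.random.default_rng(3)
P_pi = rng.dirichlet(np.ones(4), size=4)   # fixed-policy transition matrix
r_pi = rng.uniform(size=4)                 # fixed-policy rewards

def span_of_residual(J):
    # The map J -> r_pi + gamma * P_pi @ J - J is affine, so composing it
    # with the convex span semi-norm gives a convex objective in J.
    return span(r_pi + gamma * P_pi @ J - J)

# Convexity check at the midpoint of a segment between two candidates.
J0, J1 = rng.normal(size=4), rng.normal(size=4)
lhs = span_of_residual(0.5 * (J0 + J1))
rhs = 0.5 * (span_of_residual(J0) + span_of_residual(J1))
print(lhs <= rhs + 1e-12)  # True: midpoint value does not exceed the chord
```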


Robust Approximate Bilinear Programming for Value Function Approximation

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms ...




Publication date: 2008